Parallelization of Sparse Cholesky Factorization on an SMP Cluster
Authors
Abstract
In this paper, we present parallel implementations of the sparse Cholesky factorization kernel from the SPLASH-2 programs to evaluate the performance of a Pentium Pro based SMP cluster. Solaris threads and remote memory operations are used for intranode parallelism and internode communication, respectively. Sparse Cholesky factorization is a typical irregular application with a high communication-to-computation ratio and no global synchronization between steps. We parallelized it efficiently using asynchronous message handling instead of lock-based mutual exclusion between nodes, because synchronization between nodes reduces performance significantly. We also found that the mapping of processes to processors on an SMP cluster affects performance, especially when the communication latency cannot be hidden.
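The idea of replacing lock-based mutual exclusion with asynchronous message handling can be illustrated with a small sketch. It is not the paper's implementation, which relies on Solaris threads and remote memory operations; here Python threads and `queue.Queue` stand in for nodes and internode messages. The function name, the cyclic column-to-owner mapping, and the use of plain floats as "columns" are all assumptions made for illustration. Each update is sent to the node that owns the target column, and only that owner ever writes the column, so no lock on the column data is required.

```python
import queue
import threading


def run_owner_handled_updates(num_nodes, num_columns, updates):
    """Apply (column, value) updates so that each column is written only by its owner.

    Hypothetical illustration: threads stand in for cluster nodes and
    queues stand in for internode messages.
    """
    inboxes = [queue.Queue() for _ in range(num_nodes)]
    columns = [0.0] * num_columns

    def owner(col):
        # Assumed cyclic mapping of columns to nodes.
        return col % num_nodes

    def handler(node):
        # The owner drains its inbox and applies updates one at a time.
        while True:
            msg = inboxes[node].get()
            if msg is None:  # sentinel: no more messages for this node
                break
            col, value = msg
            columns[col] += value  # only the owner writes this column

    def producer(chunk):
        # A producer never touches column data; it only sends messages.
        for col, value in chunk:
            inboxes[owner(col)].put((col, value))

    handlers = [threading.Thread(target=handler, args=(n,)) for n in range(num_nodes)]
    producers = [
        threading.Thread(target=producer, args=(updates[n::num_nodes],))
        for n in range(num_nodes)
    ]
    for t in handlers + producers:
        t.start()
    for t in producers:
        t.join()
    # All updates are queued by now, so the sentinels arrive last.
    for box in inboxes:
        box.put(None)
    for t in handlers:
        t.join()
    return columns
```

Calling `run_owner_handled_updates(2, 4, [(c % 4, 1.0) for c in range(1000)])` distributes 1000 unit updates over four columns owned by two handler threads. The senders never block on a remote lock; they only enqueue a message and continue, which is the property the abstract credits for the performance gain.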
Similar resources
Experiments with Cholesky Factorization on Clusters of SMPs
Cholesky factorization of large dense matrices is an integral part of many applications in science and engineering. In this paper we report on experiments with different parallel versions of Cholesky factorization on modern high-performance computing architectures. For the parallelization of Cholesky factorization we utilized various standard linear algebra software packages and present perform...
Implementing a parallel matrix factorization library on the cell broadband engine
Matrix factorization (often called decomposition) is a frequently used kernel in a large number of applications ranging from linear solvers to data clustering and machine learning. The central contribution of this paper is a thorough performance study of four popular matrix factorization techniques, namely, LU, Cholesky, QR, and SVD on the STI Cell broadband engine. The paper explores algori...
Cholesky Factorization of Band Matrices Using Multithreaded BLAS
In this paper we analyze the efficacy of the LAPACK blocked routine for the Cholesky factorization of symmetric positive definite band matrices on Intel SMP platforms using two multithreaded implementations of BLAS. We also propose strategies that alleviate some of the performance degradation that is observed, and which is basically due to the use of multiple threads when dealing with problems ...
An Algorithm-by-Blocks for SuperMatrix Band Cholesky Factorization
We pursue the scalable parallel implementation of the factorization of band matrices with medium to large bandwidth targeting SMP and multi-core architectures. Our approach decomposes the computation into a large number of fine-grained operations exposing a higher degree of parallelism. The SuperMatrix run-time system allows an out-of-order scheduling of operations that is transparent to the pr...
High Performance Cholesky Factorization via Blocking and Recursion That Uses Minimal Storage
We present a high performance Cholesky factorization algorithm, called BPC for Blocked Packed Cholesky, which performs as well as or better than the LAPACK DPOTRF subroutine, but with about the same memory requirements as the LAPACK DPPTRF subroutine, which runs at level 2 BLAS speed. Algorithm BPC only calls DGEMM and level 3 kernel routines. It combines a recursive algorithm with blocking and ...
Journal title:
Volume, issue:
Pages: -
Publication date: 1999